1 Papers from Hotnets-II: A case for run-time adaptation in packet processing systems

Ravi Kokku, Taylor L. Riche, Aaron Kunze, Jayaram Mudigonda, Jamie Jason, Harrick M. Vin
January 2004 ACM SIGCOMM Computer Communication Review, volume 34 issue 1

Publisher: ACM Press 


Most packet processing applications receive and process multiple types of packets. Today, 
the processors available within packet processing systems are allocated to packet types at 
design time. In this paper, we explore the benefits and challenges of adapting allocations 
of processors to packet types in packet processing systems. We demonstrate that, for all
the applications and traces considered, run-time adaptation can reduce energy
consumption by 70-80% and processor provisioning level by 40 ...



2 Trading packet headers for packet processing

Girish P. Chandranmenon, George Varghese 

April 1996 IEEE/ACM Transactions on Networking (TON), volume 4 issue 2 
Publisher: IEEE Press 

3 Trading packet headers for packet processing

Girish P. Chandranmenon, George Varghese 

October 1995 ACM SIGCOMM Computer Communication Review, Proceedings of the
conference on Applications, technologies, architectures, and protocols
for computer communication SIGCOMM '95, volume 25 issue 4

Publisher: ACM Press 




In high speed networks, packet processing is relatively expensive while bandwidth is
cheap. Thus it pays to add information to packet headers to make packet processing
easier. While this is an old idea, we describe several specific new mechanisms based on
this principle. We describe a new technique, source hashing, which can provide O(1)
lookup costs at the Data Link, Routing, and Transport layers. Source hashing is especially
powerful when combined with the old idea of a flow I ...
4 Packet processing architectures: A methodology for evaluating runtime support in
network processors

Xin Huang, Tilman Wolf

December 2006 Proceedings of the 2006 ACM/IEEE symposium on Architecture for
networking and communications systems ANCS '06

Publisher: ACM Press 




Modern network processor systems require the ability to adapt their processing
capabilities at runtime to changes in network traffic. Traditionally, network processor
applications have been optimized for a single static workload scenario, but recently 
several approaches for run-time adaptation have been proposed. Comparing these 
approaches and developing novel run-time support algorithms is difficult due to the 
multicore system-on-a-chip nature of network processors. In this paper, we present a ... 

Keywords: network processors, runtime management, workload partitioning and 
mapping 





5 Network processor architecture: Overcoming the memory wall in packet processing:
hammers or ladders?

Jayaram Mudigonda, Harrick M. Vin, Raj Yavatkar 

October 2005 Proceedings of the 2005 symposium on Architecture for networking
and communications systems ANCS '05

Publisher: ACM Press


Overhead of memory accesses limits the performance of packet processing applications. 
To overcome this bottleneck, today's network processors can utilize a wide range of
mechanisms, such as multi-level memory hierarchy, wide-word accesses, special-purpose
result-caches, asynchronous memory, and hardware multi-threading. However,
supporting all of these mechanisms complicates programmability and hardware design,
and wastes system resources. In this paper, we address the following fundamental
questi ... 

Keywords: data-caches, multithreading, network processors 

6 Automatically partitioning packet processing applications for pipelined architectures

Jinquan Dai, Bo Huang, Long Li, Luddy Harrison 

June 2005 ACM SIGPLAN Notices, Proceedings of the 2005 ACM SIGPLAN conference
on Programming language design and implementation PLDI '05, volume 40
issue 6

Publisher: ACM Press 


Modern network processors employ parallel processing engines (PEs) to keep up with
explosive Internet packet processing demands. Most network processors further allow
processing engines to be organized in a pipelined fashion to enable higher processing 
throughput and flexibility. In this paper, we present a novel program transformation 
technique to exploit parallel and pipelined computing power of modern network 
processors. Our proposed method automatically partitions a sequential packet proces ... 

Keywords: live-set transmission, network processor, packet processing, parallel, 
pipelining transformation, program partition 
7 Design space exploration for embedded systems: A framework for evaluating design
tradeoffs in packet processing architectures

Lothar Thiele, Samarjit Chakraborty, Matthias Gries, Simon Künzli
June 2002 Proceedings of the 39th conference on Design automation DAC '02 

Publisher: ACM Press 


We present an analytical method to evaluate embedded network packet processor 
architectures, and to explore their design space. Our approach is in contrast to those
based on simulation, which tend to be infeasible when the design space is very large. We 
illustrate the feasibility of our method using a detailed case study. 

8 Packet processing architectures: High-throughput sketch update on a low-power
stream processor

Yu-Kuen Lai, Gregory T. Byrd 

December 2006 Proceedings of the 2006 ACM/IEEE symposium on Architecture for 

networking and communications systems ANCS '06 

Publisher: ACM Press


Sketch algorithms are widely used for many networking applications, such as identifying 
frequent items, top-k flows, and traffic anomalies. This paper explores the implementation 
of the Count-Min sketch update using Indexed SRF accesses on a SIMD stream processor 
(Imagine). Both the sketch data structure and the packet stream are modeled as streams, 
and in-lane accesses to the stream register file (SRF) support concurrent updates without 
explicit synchronization. The 500-MHz stream processor is ... 

Keywords: SIMD, VLIW, data stream processing, network processors, sketch, stream 
architecture 



9 Packet processing architectures: Symerton - using virtualization to accelerate packet
processing

Aaron R. Kunze, Stephen D. Goglin, Erik J. Johnson

December 2006 Proceedings of the 2006 ACM/IEEE symposium on Architecture for
networking and communications systems ANCS '06
Publisher: ACM Press 


The complexity of packet-processing applications continues to grow, with encryption, 
compression, and XML processing becoming common on packet-processing devices at the 
edge of enterprise and service provider networks. While performance remains a key 
differentiator for these devices, the complexity and rate of change in the supported 
applications has made general-purpose platforms an attractive alternative to ASICs and 
network processors. General-purpose platforms offer excellent programmability ...

Keywords: communications systems, networking, virtualization 



10 Flow management: Framework for supporting multi-service edge packet processing
on network processors 

Arun Raghunath, Aaron Kunze, Erik J. Johnson, Vinod Balakrishnan 
October 2005 Proceedings of the 2005 symposium on Architecture for networking
and communications systems ANCS '05
Publisher: ACM Press 

Network edge packet-processing systems, as are commonly implemented on network
processor platforms, are increasingly required to support a rich set of services. These
multi-service systems are also subjected to widely varying and unpredictable traffic. 
Current network processor systems do not simultaneously deal well with a variety of 
services and fluctuating workloads. For example, current methods of worst-case, static 
provisioning can meet performance requirements for any workload, but provisi ... 

Keywords: edge packet processing, network processors, run-time adaptation 




11 Managing memory access latency in packet processing

Jayaram Mudigonda, Harrick M. Vin, Raj Yavatkar 

June 2005 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
2005 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems SIGMETRICS '05, volume 33 issue 1
Publisher: ACM Press 


In this study, we refute the popular belief [1,2] that packet processing does not benefit 
from data-caching. We show that a small data-cache of 8KB can bring down the packet
processing time by as much as 50-90%, while reducing the off-chip memory bandwidth
usage by about 60-95%. We also show that, unlike general-purpose computing, packet 
processing, due to its memory-intensive nature, cannot rely exclusively on data-caching 
to eliminate the memory bottleneck completely. 

Keywords: data-caches, multithreading, network processors 




12 Packet processing architectures: An effective network processor design framework:
using multi-objective evolutionary algorithms and object oriented techniques to
optimise the Intel IXP1200 network processor

Liam Noonan, Colin Flanagan 

December 2006 Proceedings of the 2006 ACM/IEEE symposium on Architecture for
networking and communications systems ANCS '06

Publisher: ACM Press 




In this paper we present a framework for design space exploration of a network
processor that incorporates parameterisation, power and cost analysis. This method
utilises multi-objective evolutionary algorithms and object oriented analysis and design.
Using this approach an engineer specifies certain hard and soft performance requirements 
for a multi-processor system, and allows it to be generated automatically by competitive 
evolution/optimisation, thus obviating the need for detailed design. ... 

Keywords: design space exploration, evolutionary approaches, object oriented 

13 Reprogrammable network packet processing on the field programmable port
extender (FPX)

John W. Lockwood, Naji Naufel, Jon S. Turner, David E. Taylor

February 2001 Proceedings of the 2001 ACM/SIGDA ninth international symposium 

on Field programmable gate arrays FPGA '01 
Publisher: ACM Press 


A prototype platform has been developed that allows processing of packets at the edge of 
a multi-gigabit-per-second network switch. This system, the Field Programmable Port
Extender (FPX), enables packet processing functions to be implemented as modular
components in reprogrammable hardware. All logic on the FPX is implemented in
two Field Programmable Gate Arrays (FPGAs). Packet processing functions in the system 
are implemented as dynamically-loadable modules. Core functi ... 

Keywords: ATM, FPGA, IP, Internet, hardware, modularity, network, packet, processing, 
reconfiguration, routing 



14 Session 38: communication-driven synthesis: Synthesis of high-performance packet
processing pipelines

Cristian Soviani, Ilija Hadzic, Stephen A. Edwards

July 2006 Proceedings of the 43rd annual conference on Design automation DAC '06 

Publisher: ACM Press 


Packet editing is a fundamental building block of data communication systems such as 
switches and routers. Circuits that implement this function are critical and define the 
features of the system. We propose a high-level synthesis technique for a new model for 
representing packet editing functions. Experiments show our circuits achieve a throughput 
of up to 40Gb/s on a commercially available FPGA device, equal to state-of-the-art 
implementations. 

Keywords: FPGAs, high-level synthesis, networking, packet processors 




15 Transport protocol processing at GBPS rates

N. Jain, M. Schwartz, T. Bashkow

August 1990 ACM SIGCOMM Computer Communication Review , Proceedings of the 

ACM symposium on Communications architectures & protocols 

SIGCOMM '90, Volume 20 Issue 4 
Publisher: ACM Press 


This paper proposes an architecture for accomplishing transport protocol processing at 
Gbps rates. The limitations of currently used transport protocols have been analyzed 
extensively in recent literature. Several benchmark studies have established the 
achievable throughput of ISO TP4 and TCP to be in the low Mbps range; several new 
protocols and implementation techniques have been proposed to achieve 100 Mbps and 
higher throughput rates. We briefly review some of these protocols and estabi ... 

16 Software Processing Performance in Network Processors

I. Papaefstathiou, G. Kornaros, N. Zervos 

February 2004 Proceedings of the conference on Design, automation and test in 

Europe - Volume 3 DATE '04 

Publisher: IEEE Computer Society 




To meet the demand for higher performance, flexibility, and economy in today's
state-of-the-art networks, an alternative to the ASICs that traditionally were used to implement
packet-processing functions in hardware, called network processors (NPs), has emerged.
In this paper, we briefly outline the architecture of such an innovative network processor
aiming at the acceleration of protocol processing in high-speed network interfaces, and we
use this architecture as a case study for our measuremen ...

17 Improving network simulation: Considering processing cost in network simulations

Ramaswamy Ramaswamy, Ning Weng, Tilman Wolf 

August 2003 Proceedings of the ACM SIGCOMM workshop on Models, methods and 

tools for reproducible network research MoMeTools '03 

Publisher: ACM Press 


In many network simulations and models the cost of processing a packet is considered
negligible or overly simplified. The functionality of routers is steadily increasing and
complex processing of packet payloads is being implemented (deep packet classification,
encryption, content transcoding). We show two examples where processing cost can 
contribute to a significant portion of the overall packet delay. To enable a more precise 
consideration of processing delay, we present a tool called NPEST { ... 

18 Session 7: embedded system techniques (2): Handling of packet dependencies: a
critical issue for highly parallel network processors 

Stephen Melvin, Yale Patt 

October 2002 Proceedings of the 2002 international conference on Compilers, 

architecture, and synthesis for embedded systems CASES '02 
Publisher: ACM Press 


Network processors are being asked to perform increasingly complex operations on 
packets of information at faster and faster rates. Because processor performance and
memory cycle times are not keeping up with this demand, there is a fundamental need for
simultaneous processing of multiple packets, and the degree of this parallelism is
increasing. Sometimes a dependency exists between two packets currently being 
operated on, and as the ratio of packet processing time to packet transmission time 1 ... 

Keywords: memory synchronization, multithreaded processors, network processors, 
packet dependencies, packet processing, parallel processing, processor architecture, 
thread level speculation 
proceedings on Communications architectures and protocols SIGCOMM
'88, Volume 18 Issue 4

Publisher: ACM Press 
20 The effectiveness of affinity-based scheduling in multiprocessor network protocol
processing (extended version)

James D. Salehi, James F. Kurose, Don Towsley 

August 1996 IEEE/ACM Transactions on Networking (TON), volume 4 issue 4
Publisher: IEEE Press 
